Complementarity between public and commercial databases: new opportunities in medicinal chemistry informatics.

Authors

  • Christopher Southan
  • Péter Várkonyi
  • Sorel Muresan
Abstract

The last two years have seen a dramatic expansion in public cheminformatics, as exemplified by the approximately five-fold growth of PubChem, which now draws on over 50 contributing data sources. Consequently, medicinal chemists who were hitherto limited to commercial databases now also have access to public sources that they can download and/or query directly over the Web. The range of public sources, particularly where they link out to structured bioinformatic and biological data, already offers utilities that have no commercial equivalent. This work reviews compound content comparisons between selected public and commercial databases that capture bioactive content. We focused particularly on those that specify relationships between compounds and their protein targets. Our stringent filtering produced lower unique compound numbers than those reported for individual databases and thereby facilitated standardised comparisons of content. The resultant matrix shows the pairwise comparison of each database and selected subsets. Overall, this showed an unexpected degree of non-overlap, thereby emphasising the complementarity gained from combining public and commercial sources. This conclusion is supported by a Venn-type analysis of GVKBIO, WOMBAT (both commercial) and PubChem (public). These databases show not only overlap but also unique bioactive content in each case because of their different strategies for source selection and data collection.
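The pairwise matrix and Venn-type analysis described in the abstract can be illustrated with plain set operations. The following is a minimal sketch, not the authors' method: it assumes each database has already been filtered and reduced to a set of standardised compound identifiers, and the function names (`pairwise_overlap`, `venn3`) and the toy identifiers (`c1` to `c7`) are invented for illustration only.

```python
from itertools import combinations


def pairwise_overlap(sources):
    """Count shared compounds for every unordered pair of sources.

    `sources` maps a database name to a set of compound identifiers
    (assumed to be standardised beforehand; that step is not shown).
    """
    return {
        (a, b): len(sources[a] & sources[b])
        for a, b in combinations(sorted(sources), 2)
    }


def venn3(a, b, c):
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Hypothetical toy data: these identifiers are placeholders, not real content.
toy = {
    "GVKBIO": {"c1", "c2", "c3", "c4"},
    "WOMBAT": {"c3", "c4", "c5"},
    "PubChem": {"c4", "c5", "c6", "c7"},
}

matrix = pairwise_overlap(toy)
regions = venn3(toy["GVKBIO"], toy["WOMBAT"], toy["PubChem"])
```

In this toy case the seven region sizes sum to the size of the union of the three sets, which is a quick sanity check that no compound has been counted twice.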


Similar resources

Parallel Worlds of Public and Commercial Bioactive Chemistry Data

The availability of structures and linked bioactivity data in databases is powerfully enabling for drug discovery and chemical biology. However, we now review some confounding issues with the divergent expansions of public and commercial sources of chemical structures. These are associated with not only expanding patent extraction but also increasingly large vendor collections amassed via diffe...


Quantitative assessment of the expanding complementarity between public and commercial databases of bioactive compounds

BACKGROUND Since 2004 public cheminformatic databases and their collective functionality for exploring relationships between compounds, protein sequences, literature and assay data have advanced dramatically. In parallel, commercial sources that extract and curate such relationships from journals and patents have also been expanding. This work updates a previous comparative study of databases c...


A Systematic Review Opportunities and Challenges of Tele-cardiology in Health Care Systems

Aim: Telecardiology provides a useful diagnostic tool for accurate and rapid diagnosis of patients with cardiac disorders to specialist and general practitioners. In this study, a systematic review was conducted to identify effective components and approaches in Telecardiology, such as the opportunities and challenges of applying this system in different domains. Information sources or data: T...


Expanding opportunities for mining bioactive chemistry from patents

Bioactive structures published in medicinal chemistry patents typically exceed those in papers by at least twofold and may precede them by several years. The Big-Bang of open automated extraction since 2012 has contributed to over 15 million patent-derived compounds in PubChem. While mapping between chemical structures, assay results and protein targets from patent documents is challenging, the...


Information extraction in the life sciences: perspectives for medicinal chemistry, pharmacology and toxicology.

Information extraction approaches have been successfully applied to mine the scientific literature in biology and medicine. So far, the main focus of research and development in this domain was on the recognition and extraction of gene and protein names in the context of molecular biology and genome research and on disease names and other medical terms in the context of clinical research. Simil...



Journal:
  • Current topics in medicinal chemistry

Volume 7, Issue 15

Pages: -

Publication date: 2007